Opinion and Generic Question Answering Systems: a Performance Analysis
Abstract
The importance of new textual genres such as blogs and forum entries is growing in parallel with the evolution of the Social Web. This paper presents two corpora of blog posts, in English and in Spanish, annotated according to the EmotiBlog annotation scheme. Furthermore, we created 20 factual and opinionated questions for each language, together with the Gold Standard for their answers in the corpus. The purpose of our work is to study the challenges involved in a mixed fact and opinion question answering setting by comparing the performance of two Question Answering (QA) systems in that setting. The first one is open domain, while the second one is opinion-oriented. We evaluate the two systems separately in both languages and propose possible solutions to improve QA systems that have to process mixed questions.
Introduction and motivation
In the last few years, the number of blogs has grown exponentially. Thus, the Web contains more and more subjective texts. Research from the Pew Institute shows that 75,000 blogs are created daily (Pang and Lee, 2008). They cover a great variety of topics (computer science, sociology, political science or economics) and are written by different types of people, which makes them a relevant resource for analyzing the behavior of large communities. Due to the high volume of data contained in blogs, new Natural Language Processing (NLP) resources, tools and methods are needed to process and understand their language. Our first contribution is that our research is multilingual, covering English and Spanish. Secondly, many sources are present in blogs, as people introduce quotes from newspaper articles or other information to support their arguments and make references to previous posts in the discussion thread. Thus, when performing a task such as Question Answering (QA), many new aspects have to be taken into consideration.
Previous studies in the field (Stoyanov, Cardie and Wiebe, 2005) showed that certain types of queries, which are factual in nature, require the use of Opinion Mining (OM) resources and techniques to retrieve the correct answers. A further contribution of this paper is the analysis and definition of criteria for discriminating between factual and opinionated question types. Previous researchers mainly concentrated on newspaper collections. We formulated and annotated a set of questions and answers over a multilingual blog collection. A further contribution is the evaluation and comparison of two different approaches to QA: a fact-oriented one and another designed for opinion QA scenarios.
Publication date: 2009